Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
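
The abstract above mentions K-fold cross-validation on the training set and ensembling of multiple models. As a rough illustration of what those two practices involve, the following is a minimal sketch in plain Python. The function names `k_fold_indices` and `ensemble_mean` are hypothetical, and averaging predictions is only one of several possible ensembling rules; nothing here is taken from the surveyed challenge solutions.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs.

    Hypothetical helper: contiguous folds, no shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds


def ensemble_mean(predictions):
    """Average per-sample predictions coming from several models.

    `predictions` holds one list of per-sample scores per model.
    """
    return [mean(sample) for sample in zip(*predictions)]


if __name__ == "__main__":
    for train_idx, val_idx in k_fold_indices(10, 5):
        print(len(train_idx), "training samples, validation on", val_idx)
    print(ensemble_mean([[0.2, 0.9], [0.4, 0.7]]))
```

In practice, one model is typically trained per fold, and the resulting models can then serve as the "multiple identical models" whose predictions are combined.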

Translated by Google Translate

Bridging Component Learning with Degradation Modelling for Blind Image Super-Resolution

Yixuan Wu , Feng Li , Huihui Bai , Weisi Lin , Runmin Cong , Yao Zhao

Category: Computer Vision

2022-12-03

Convolutional neural network (CNN)-based image super-resolution (SR) has shown impressive success on low-resolution (LR) images with known degradations. However, such approaches struggle to maintain their performance in practical scenarios where the degradation process is unknown. Although existing blind SR methods address this problem through blur kernel estimation, their perceptual quality and reconstruction accuracy remain unsatisfactory. In this paper, we analyze the degradation of a high-resolution (HR) image in terms of its intrinsic components, following a degradation-based formulation model. We propose a components decomposition and co-optimization network (CDCN) for blind SR. First, CDCN decomposes the input LR image into structure and detail components in feature space. Then, a mutual collaboration block (MCB) is presented to exploit the relationship between the two components. In this way, the detail component provides informative features that enrich the structural context, and the structure component carries structural context that helps reveal details, in a mutually complementary manner. After that, we present a degradation-driven learning strategy to jointly supervise the restoration of HR image detail and structure. Finally, a multi-scale fusion module followed by an upsampling layer is designed to fuse the structure and detail features and perform SR reconstruction. Empowered by this degradation-based component decomposition, collaboration, and mutual optimization, we bridge component learning and degradation modelling for blind SR, thereby producing SR results with more accurate textures. Extensive experiments on both synthetic SR datasets and real-world images show that the proposed method achieves state-of-the-art performance compared to existing methods.

Learning Detail-Structure Alternative Optimization for Blind Super-Resolution

Feng Li , Yixuan Wu , Huihui Bai , Weisi Lin , Runmin Cong , Yao Zhao

Category: Computer Vision

2022-12-03

Existing convolutional neural network (CNN)-based image super-resolution (SR) methods have achieved impressive performance under the bicubic kernel, but this assumption does not hold for the unknown degradations found in real-world applications. Recent blind SR methods reconstruct SR images by relying on blur kernel estimation. However, their results still exhibit visible artifacts and detail distortion due to estimation errors. To alleviate these problems, in this paper, we propose an effective and kernel-free network, namely DSSR, which enables recurrent detail-structure alternative optimization for blind SR without incorporating a blur kernel prior. Specifically, in DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures. The DSMM consists of two components: a detail restoration unit (DRU) and a structure modulation unit (SMU). The former regresses the intermediate HR detail reconstruction from LR structural contexts, and the latter modulates the structural contexts conditioned on the learned detail maps in both HR and LR spaces. In addition, we use the output of the DSMM as the hidden state and design the DSSR architecture from a recurrent convolutional neural network (RCNN) view. In this way, the network alternately optimizes the image details and structural contexts, achieving co-optimization across time. Moreover, equipped with the recurrent connection, DSSR lets low- and high-level feature representations complement each other by observing previous HR details and contexts at every unrolling step. Extensive experiments on synthetic datasets and real-world images demonstrate that our method achieves state-of-the-art performance against existing methods. The source code can be found at https://github.com/Arcananana/DSSR.

A heterogeneous group CNN for image super-resolution

Chunwei Tian , Yanning Zhang , Wangmeng Zuo , Chia-Wen Lin , David Zhang , Yixuan Yuan

Category: Computer Vision

2022-09-26

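Convolutional neural networks (CNNs) have obtained remarkable performance via deep architectures. However, these CNNs often show poor robustness for image super-resolution (SR) in complex scenes. In this paper, we present a heterogeneous group SR CNN (HGSRCNN) that leverages structure information of different types to obtain a high-quality image. Specifically, each heterogeneous group block (HGB) of HGSRCNN uses a heterogeneous architecture containing a symmetric group convolutional block and a complementary convolutional block in parallel to enhance the internal and external relations of different channels, facilitating richer information of different types. To prevent the obtained features from becoming redundant, a refinement block with signal enhancements applied in a serial way is designed to filter out useless information. To prevent loss of the original information, a multi-level enhancement mechanism guides the CNN towards a symmetric architecture that promotes the expressive ability of HGSRCNN. In addition, a parallel up-sampling mechanism is developed to train a blind SR model. Extensive experiments show that the proposed HGSRCNN obtains excellent SR performance in both quantitative and qualitative analysis. Code can be accessed at https://github.com/hellloxiaotian/hgsrcnn.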

DeepGraphONet: A Deep Graph Operator Network to Learn and Zero-shot Transfer the Dynamic Response of Networked Systems

Yixuan Sun , Christian Moya , Guang Lin , Meng Yue

Category: Machine Learning

2022-09-21

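This paper develops a Deep Graph Operator Network (DeepGraphONet) framework that learns to approximate the dynamics of a complex system (e.g., the power grid or traffic) with an underlying sub-graph structure. We build DeepGraphONet by fusing the ability of (i) graph neural networks (GNNs) to exploit spatially correlated graph information and (ii) deep operator networks (DeepONet) to approximate the solution operator of dynamical systems. The resulting DeepGraphONet can then predict the dynamics within a given short- or medium-term time horizon by observing a finite history of the graph state information. Furthermore, we design DeepGraphONet to be resolution-independent. That is, we do not require the finite history to be collected at the exact same resolution. In addition, to disseminate the results of a trained DeepGraphONet, we design a zero-shot learning strategy that enables using it on a different sub-graph. Finally, empirical results on (i) the transient stability prediction problem of power grids and (ii) the traffic flow forecasting problem of a vehicular system illustrate the effectiveness of the proposed DeepGraphONet.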

Cooperative Actor-Critic via TD Error Aggregation

Martin Figura , Yixuan Lin , Ji Liu , Vijay Gupta

Category: Machine Learning

2022-07-25

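In decentralized cooperative multi-agent reinforcement learning, agents can aggregate information from one another to learn policies that maximize a team-average objective function. Despite their willingness to cooperate with others, the individual agents may find direct sharing of information about their local state, reward, and value function undesirable due to privacy issues. In this work, we introduce a decentralized actor-critic algorithm with TD error aggregation that does not violate privacy and assumes that communication channels are subject to time delays and packet dropouts. The cost we pay for making such weak assumptions is an increased communication burden, as measured by the dimension of the transmitted data. Interestingly, the communication burden is only quadratic in the graph size, which makes the algorithm applicable to large networks. We provide a convergence analysis under diminishing step size to verify that the agents maximize the team-average objective function.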
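
To make the idea of aggregating TD errors (rather than sharing rewards or value functions) more concrete, here is a minimal sketch. It is not the algorithm from the paper: it assumes an undirected, connected communication graph with perfectly reliable, synchronous links (no delays or packet dropouts) and uses standard Metropolis-weight consensus averaging. The names `td_error` and `consensus_average` are hypothetical.

```python
def td_error(reward, value_s, value_next, gamma=0.99):
    """Local one-step TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def consensus_average(values, neighbors, rounds=100):
    """Metropolis-weight averaging on an undirected, connected graph.

    `neighbors[i]` lists the agents that agent i exchanges messages with.
    Each agent only ever sees scalars sent by its neighbors; every local
    value converges to the network-wide average.
    """
    x = list(values)
    degree = [len(n) for n in neighbors]
    for _ in range(rounds):
        new_x = []
        for i, xi in enumerate(x):
            acc = xi
            for j in neighbors[i]:
                weight = 1.0 / (1 + max(degree[i], degree[j]))
                acc += weight * (x[j] - xi)
            new_x.append(acc)
        x = new_x
    return x


if __name__ == "__main__":
    # Three agents on a path graph 0 - 1 - 2 (assumed topology).
    neighbors = [[1], [0, 2], [1]]
    local_errors = [
        td_error(reward=1.0, value_s=0.5, value_next=0.6),
        td_error(reward=0.0, value_s=0.2, value_next=0.1),
        td_error(reward=2.0, value_s=1.0, value_next=0.9),
    ]
    print(consensus_average(local_errors, neighbors))
```

Each agent would then use its estimate of the team-average TD error in its own critic and actor updates, without ever transmitting its local reward or value estimate.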

On Data Scaling in Masked Image Modeling

Zhenda Xie , Zheng Zhang , Yue Cao , Yutong Lin , Yixuan Wei , Qi Dai , Han Hu

Category: Computer Vision

2022-06-09

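An important goal of self-supervised learning is to enable model pre-training to benefit from almost unlimited data. However, one method that has recently become popular, namely masked image modeling (MIM), is suspected to be unable to benefit from larger data. In this work, we break this misconception through extensive experiments, with data scales ranging from 10% of ImageNet-1K to the full ImageNet-22K, model sizes ranging from 49 million to 1 billion parameters, and training lengths ranging from 125K to 500K iterations. Our study reveals that: (i) Masked image modeling is also demanding on larger data. We observed that very large models overfit on relatively small data. (ii) The length of training matters. Large models trained with masked image modeling can benefit from more data when trained longer. (iii) The validation loss in pre-training is a good indicator of how well the model performs when fine-tuned on multiple tasks. This observation allows us to evaluate pre-trained models in advance, without costly trial-and-error assessments on downstream tasks. We hope that our findings will advance the understanding of masked image modeling in terms of scaling ability.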

Image Super-resolution with An Enhanced Group Convolutional Neural Network

Chunwei Tian , Yixuan Yuan , Shichao Zhang , Chia-Wen Lin , Wangmeng Zuo , David Zhang

Category: Computer Vision

2022-05-29

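CNNs with strong learning abilities are widely chosen to solve the super-resolution problem. However, CNNs rely on deeper network architectures to improve the performance of image super-resolution, which may increase the computational cost. In this paper, we present an enhanced super-resolution group CNN (ESRGCNN) with a shallow architecture that fully fuses deep and wide channel features to extract more accurate low-frequency information based on the correlations of different channels in single image super-resolution (SISR). Also, a signal enhancement operation in ESRGCNN is useful for inheriting more long-distance contextual information to resolve long-term dependency. An adaptive up-sampling operation is integrated into the CNN to obtain an image super-resolution model that handles low-resolution images of different sizes. Extensive experiments show that ESRGCNN surpasses the state of the art in terms of SR performance, complexity, execution speed, image quality evaluation, and visual effect in SISR. Code is available at https://github.com/hellloxiaotian/esrgcnn.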
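
As background on why group convolutions are attractive when computational cost matters, the following worked arithmetic counts the weights of a 2D convolution with and without channel groups. This is the standard parameter-count formula, not a description of the ESRGCNN architecture itself; the function name `conv2d_weight_count` and the channel sizes are illustrative assumptions.

```python
def conv2d_weight_count(c_in, c_out, kernel, groups=1):
    """Number of weights (biases excluded) in a 2D convolution layer.

    With `groups` > 1, each output channel only connects to
    c_in / groups input channels, so the count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


if __name__ == "__main__":
    # Illustrative layer: 64 -> 64 channels, 3x3 kernel.
    dense = conv2d_weight_count(64, 64, 3)
    grouped = conv2d_weight_count(64, 64, 3, groups=4)
    print(dense, grouped, dense / grouped)
```

The trade-off is that channels in different groups no longer interact within the layer, which is why grouped designs usually add some mechanism for fusing information across groups.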

Finite-Time Error Bounds for Distributed Linear Stochastic Approximation

Yixuan Lin , Vijay Gupta , Ji Liu

Category: Machine Learning

2021-11-24

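This paper considers a novel multi-agent linear stochastic approximation algorithm driven by Markovian noise and general consensus-type interaction, in which each agent evolves according to its local stochastic approximation process, which depends on the information from its neighbors. The interconnection structure among the agents is described by a time-varying directed graph. While the convergence of consensus-based stochastic approximation algorithms has been studied for the case where the interconnection among the agents is described by doubly stochastic matrices (at least in expectation), less is known about the case where the interconnection matrix is simply stochastic. For any uniformly strongly connected graph sequence whose associated interaction matrices are stochastic, the paper derives finite-time bounds on the mean-square error, defined as the deviation of the output of the algorithm from the unique equilibrium point of the associated ordinary differential equation. When the interconnection matrices are stochastic, the equilibrium point can be any unspecified convex combination of the local equilibria of all the agents in the absence of communication. Both constant and time-varying step sizes are considered. For the case where the convex combination is required to be a straight average and the interaction between any pair of neighboring agents may be uni-directional, so that doubly stochastic matrices cannot be implemented in a distributed manner, the paper proposes a push-sum-type distributed stochastic approximation algorithm and provides its finite-time bound for the time-varying step-size case, by leveraging the analysis of the consensus-type algorithm with stochastic matrices and developing novel properties of the push-sum algorithm.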
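
The push-sum idea referenced in the abstract above can be illustrated with the classical push-sum averaging iteration, which recovers a straight average over a directed graph without needing doubly stochastic weights. The sketch below is only that basic averaging building block, assuming a fixed, strongly connected graph with synchronous updates; it is not the distributed stochastic approximation algorithm proposed in the paper, and the name `push_sum` is hypothetical.

```python
def push_sum(values, out_neighbors, rounds=100):
    """Classical push-sum averaging on a fixed directed graph.

    `out_neighbors[i]` lists the agents that agent i can send to.
    Every agent splits its (x, y) pair equally among itself and its
    out-neighbors, which yields a column-stochastic update. The ratio
    x_i / y_i converges to the average of the initial values when the
    graph is strongly connected.
    """
    n = len(values)
    x = list(values)   # running numerators
    y = [1.0] * n      # running weights
    for _ in range(rounds):
        new_x = [0.0] * n
        new_y = [0.0] * n
        for i in range(n):
            targets = [i] + list(out_neighbors[i])  # keep a share locally
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_y[j] += share * y[i]
        x, y = new_x, new_y
    return [xi / yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Unbalanced directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
    out_neighbors = [[1, 2], [2], [0]]
    print(push_sum([3.0, 0.0, 6.0], out_neighbors))
```

The weight variable `y` is what corrects for the imbalance of the directed graph: the numerators alone would converge to a skewed combination, while the ratios converge to the plain average.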

Swin Transformer V2: Scaling Up Capacity and Resolution

Ze Liu , Han Hu , Yutong Lin , Zhuliang Yao , Zhenda Xie , Yixuan Wei , Jia Ning , Yue Cao , Zheng Zhang , Li Dong

Category: Computer Vision

2021-11-18

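We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification. Our techniques are generally applicable to scaling up vision models, which has not been explored as widely as for NLP language models, partly because of the following difficulties in training and application: 1) vision models often face instability issues at scale, and 2) many downstream vision tasks require high-resolution images or windows, and it is not clear how to effectively transfer models pre-trained at low resolution to higher-resolution ones. GPU memory consumption is also a problem when the image resolution is high. To address these issues, we present several techniques, illustrated using Swin Transformer as a case study: 1) a post normalization technique and a scaled cosine attention approach to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pre-trained on low-resolution images and windows to their higher-resolution counterparts. In addition, we share crucial implementation details that lead to significant savings in GPU memory consumption and thus make it feasible to train large vision models with regular GPUs. Using these techniques and self-supervised pre-training, we successfully train a strong 3B Swin Transformer model and effectively transfer it to various vision tasks involving high-resolution images or windows, achieving state-of-the-art accuracy on a variety of benchmarks.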
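
Two of the techniques named in the abstract above, scaled cosine attention and log-spaced relative coordinates, can be sketched in a few lines. The formulas used here, a cosine similarity divided by a temperature plus a position bias, and the mapping sign(Δ)·log(1 + |Δ|), follow the description in the Swin Transformer V2 paper as best recalled; the function names, the fixed temperature `tau`, and the use of the natural logarithm are assumptions made for illustration and do not mirror the official implementation.

```python
import math


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Attention logit as cosine(q, k) / tau + bias.

    Unlike a dot product, the cosine term is bounded in [-1, 1], so the
    logit does not grow with the magnitude of q or k. In the paper tau
    is learnable; a fixed value is assumed here.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau + bias


def log_spaced(delta):
    """Map a relative coordinate to sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


if __name__ == "__main__":
    # Scaling the query leaves the cosine logit unchanged.
    print(cosine_attention_logit([1.0, 2.0], [2.0, 1.0]))
    print(cosine_attention_logit([10.0, 20.0], [2.0, 1.0]))
    # Going from an 8x8 to a 16x16 window widens the linear coordinate
    # range from 7 to 15, but the log-spaced range grows far less.
    print(log_spaced(7), log_spaced(15))
```

The log-spaced mapping compresses large offsets, so a position-bias network trained on small windows has to extrapolate over a much narrower input range when the window is enlarged.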
